Educative: Interactive Courses for Software Developers

It’s best to use specialized search engine technology instead of SQL. Another way to optimize is to reduce the recurring cost of search by saving the results. The following sections describe some of the technologies offered as built-in extensions by different database brands, as well as technologies offered by independent projects. We’ll also develop a solution that uses standard SQL but is more efficient on average than substring matching.

Vendor extensions#

Every major brand of a database has invented its own answer to the common requirement of full-text search, but these features are not standard or compatible between database brands. If we use a single brand (or are willing to use vendor-dependent features), these features are the best way to get high-performance text search, with the greatest integration with SQL queries.

The following are brief descriptions of full-text search features in several brands of SQL databases. The details are subject to change, so we must always read the current documentation for our brand.

Full-text index in MySQL#

MySQL provides a simple full-text index type for the MyISAM storage engine. We can define a full-text index over columns of type CHAR, VARCHAR, or TEXT. Here’s an example that defines a full-text index that includes content from the bug summary and description columns:

Enter to Rename, Shift+Enter to Preview

Altering the table to add the index

Let’s use the MATCH() function to search for a keyword among the indexed text. We must name the columns in the full-text index (so we can match using another index that covers different columns in the same table).

Enter to Rename, Shift+Enter to Preview

Using MATCH() function to search for a keyword among the indexed text

Since MySQL 4.1, we can also use a simple Boolean expression notation in the pattern to filter results more carefully.

Enter to Rename, Shift+Enter to Preview

Using boolean expression in MySQL to filter results

Text indexing in Oracle#

Oracle had supported text-indexing features since Oracle 8 in 1997 when it was part of a data cartridge called ConText. The technology has been updated several times, and the feature is now integrated into the database software. The text indexing in Oracle is complex and rich, so here is a greatly simplified summary:

`CONTEXT`#

We can create an index of type CONTEXT for a single text column. We can use the CONTAINS() operator to search using this index. The index doesn’t stay consistent with changes to data unless we define the index with PARAMETERS ('SYNC (ON COMMIT)').

Enter to Rename, Shift+Enter to Preview

Creating CONTEXT index in Oracle

`CTXCAT`#

The CTXCAT index type is specialized for short text samples such as those used in online catalogs, along with other structured columns from the same table. The index stays consistent as transactions update the indexed data.

Enter to Rename, Shift+Enter to Preview

Creating CTXCAT index in Oracle

The CATSEARCH() operator takes two arguments for searching the text column and the structured column set, respectively.

Enter to Rename, Shift+Enter to Preview

Searching text using CATSEARCH() in Oracle

`CTXXPATH`#

The CTXXPATH index type is specialized for searching an XML document with the existsNode() operator.

Enter to Rename, Shift+Enter to Preview

Creating CTXCAT index in Oracle for searching text in the XML document

`CTXRULE`#

Let’s suppose we have a large collection of documents in our database, and we need to classify them based on their content.

With the CTXRULE index, we can design rules to analyze documents and report their classification. Alternatively, we can also provide a sample set of documents with classifications and have Oracle design the rules to apply to the rest of the document collection. We can even fully automate the process, letting Oracle analyze our collection of documents and come up with a set of rules to classify them.

Full-text search in Microsoft SQL Server#

SQL Server 2000 and later support full-text searching, with complex configuration options for languages, a thesaurus, and automatic synchronization with data changes. SQL Server provides a series of stored procedures for creating full-text indexes, and we can use the CONTAINS() operator in queries to employ them.

To perform the familiar task of searching for bugs that include the word “crash”, we must first enable the full-text feature and define a catalog in our database:

Enter to Rename, Shift+Enter to Preview

Enabling full-text feature in the database and defining a catalog in the database

Next, we need to define a full-text index on the Bugs table, add columns to the index, and activate the index:

Enter to Rename, Shift+Enter to Preview

Enabling full-text feature on Bugs table, adding a column to the index, and activating the indexes

The next step is to enable automatic synchronization for the full-text index so that changes to the indexed column are propagated to the index. Then, we can begin the process of populating the index. This will run in the background, so it may take some time to complete before queries benefit from the index.

Enter to Rename, Shift+Enter to Preview

Enabling automatic synchronization for the full-text index

Finally, we must run a query using the CONTAINS() operator:

Enter to Rename, Shift+Enter to Preview

Running the query in MS SQL using CONTAINS()

Text search in PostgreSQL#

PostgreSQL 8.3 provides a sophisticated and highly configurable way of converting text into a searchable collection of lexical elements and of matching these documents against patterns.

To achieve the best performance, we need to store the content in its original text form as well as in a searchable form using the special data type TSVECTOR.

Enter to Rename, Shift+Enter to Preview

Creating Bugs table and declaring the column as TSVECTOR

We need to make sure that the TSVECTOR column is kept in sync with the content in the text column(s) we want to make searchable. PostgreSQL provides a built-in trigger procedure to make this easier:

Enter to Rename, Shift+Enter to Preview

Creating a trigger to make the TSVECTOR column sync with the content

We should also create a generalized inverted index (GIN) on the TSVECTOR column:

Enter to Rename, Shift+Enter to Preview

Creating an index in PostgreSQL

After this, we can use the PostgreSQL text search operator @@ to search efficiently, aided by the full-text index:

/

main.sql

main.sql

Enter to Rename, Shift+Enter to Preview

Searching a specific value using PostgreSQL in the Bugs table

There are many other options for customizing searchable content, search queries, and search results.

Full-text search (FTS) in SQLite#

Standard tables in SQLite don’t support efficient full-text searches, but we can use an optional extension for SQLite to store searchable text in a virtual table specialized for searching text. Three versions of the searchable text extension exist, known as FTS1, FTS2, and FTS3.

FTS extensions are not typically enabled in a default build of SQLite, so we need to build it from source with one of the FTS extensions enabled. For example, we can add the following options to Makefile.in, and then build SQLite.

Enter to Rename, Shift+Enter to Preview

Adding some options in the makefile.in before building SQLite

Once we have a version of SQLite with FTS enabled, we can create a virtual table for the searchable text. Any data type, constraints, or other column options are ignored.

Enter to Rename, Shift+Enter to Preview

Creating a virtual table BugsText for searchable text

If we are indexing text from another table (as in this example using the Bugs table), we must copy the data into the virtual table. The FTS virtual table always contains a primary key column called docid, so we can correlate rows to those in a source table.

Enter to Rename, Shift+Enter to Preview

Inserting data into the virtual table BugsText

Now we can query the FTS virtual table BugsText using the efficient full-text search predicate MATCH, and we can join matching rows to the source table Bugs. Using the name of the FTS table as a pseudo-column matches the pattern against any column.

Enter to Rename, Shift+Enter to Preview

Retrieving data from FTS virtual table BugsText using MATCH()

The matching pattern also supports limited boolean expressions.

Enter to Rename, Shift+Enter to Preview

Boolean Expressions for retrieving data from BugsText using MATCH()

Solution: Use the Right Tool for the Job

Vendor extensions#

Full-text index in MySQL#

Text indexing in Oracle#

CONTEXT#

CTXCAT#

CTXXPATH#

CTXRULE#

Full-text search in Microsoft SQL Server#

Text search in PostgreSQL#

/

Full-text search (FTS) in SQLite#

`CONTEXT`#

`CTXCAT`#

`CTXXPATH`#

`CTXRULE`#